Plant genome evolution in the genus Eucalyptus is driven by structural rearrangements that promote sequence divergence

Genomes have a highly organized architecture (nonrandom organization of functional and nonfunctional genetic elements within chromosomes) that is essential for many biological functions, particularly gene expression and reproduction. Despite the need to conserve genome architecture, a high level of structural variation has been observed within species. As species separate and diverge, genome architecture also diverges, becoming increasingly poorly conserved as divergence time increases. However, within plant genomes, the processes of genome architecture divergence are not well described. Here we use long-read sequencing and de novo assembly of 33 phylogenetically diverse, wild and naturally evolving Eucalyptus species, covering 1–50 million years of diverging genome evolution, to measure genome architectural conservation and describe architectural divergence. The investigation of these genomes revealed that, following lineage divergence, genome architecture is highly fragmented by rearrangements. As genomes continue to diverge, the accumulation of mutations and the subsequent divergence of rearrangements beyond recognition become the primary driver of genome divergence. The loss of syntenic regions also contributes to genome divergence, but at a slower pace than that of rearrangements. We hypothesize that duplications and translocations are potentially the greatest contributors to Eucalyptus genome divergence.


Assessment of Scaffolding
Scaffolding of all genomes within this study used homology to E. grandis (Myburg et al., 2014), the only available scaffolding information. Using this external and non-species-specific data source for scaffolding can potentially introduce false positive and false negative rearrangements, altering synteny. To assess the potential impact of the scaffolding method on our results, we scaffolded E. melliodora using individual-specific Hi-C and reanalysed our genome alignments and annotations.
Hi-C sequencing of E. melliodora generated 45.48 Gbp in 151,590,503 paired reads, giving an estimated genome coverage of 71.14x. After aligning Hi-C reads to the E. melliodora contigs and identifying PCR duplicates, 18,507,548 (12.21%) read pairs were found to contain linkage information. Further examination showed that 9,612,532 (6.34%) read pairs spanned contigs and 8,895,016 (5.87%) read pairs were contained within a single contig. Non-informative reads were either chimeric, unmapped, PCR duplicates, or had low mapping quality (MAPQ < 30, mostly due to multi-mapping). Using 3D-DNA, the E. melliodora contigs were scaffolded, placing 97.60% of the genome within 11 scaffolds (Figure 1).
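The read-pair percentages above can be re-derived from the reported counts. The short sketch below does only that arithmetic; the variable and function names are illustrative, and the genome size it prints is merely the value implied by dividing the sequenced bases by the reported coverage, not a figure stated in the text.

```python
# Counts reported in the text for the E. melliodora Hi-C library.
TOTAL_PAIRS = 151_590_503
INTER_CONTIG_PAIRS = 9_612_532   # read pairs spanning two contigs
INTRA_CONTIG_PAIRS = 8_895_016   # read pairs contained within one contig
SEQUENCED_BASES = 45.48e9        # 45.48 Gbp
REPORTED_COVERAGE = 71.14        # estimated genome coverage (x)


def percent(part, whole):
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Pairs carrying linkage information are the sum of both informative classes.
informative_pairs = INTER_CONTIG_PAIRS + INTRA_CONTIG_PAIRS

# Genome size implied by the reported yield and coverage (an inference only).
implied_genome_mbp = SEQUENCED_BASES / REPORTED_COVERAGE / 1e6

print("informative pairs:", informative_pairs)
print("informative %:", percent(informative_pairs, TOTAL_PAIRS))
print("inter-contig %:", percent(INTER_CONTIG_PAIRS, TOTAL_PAIRS))
print("intra-contig %:", percent(INTRA_CONTIG_PAIRS, TOTAL_PAIRS))
print("implied genome size (Mbp):", round(implied_genome_mbp, 1))
```

Only the inter-contig pairs can join contigs into scaffolds; the intra-contig pairs still help calibrate how contact frequency decays with distance.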
The vast majority of contigs in the two differently scaffolded E. melliodora genomes were found to be identically grouped, ordered, and orientated within the 11 scaffolds (Figure 2). However, a small number of contigs, while correctly grouped and ordered, were incorrectly orientated. Similarly, a small number of contigs, while correctly grouped and orientated, were incorrectly ordered within chromosomes.
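The three properties compared here (grouping, order, and orientation) can be made concrete with a small sketch. This is not the procedure used in the study, which relied on a dot plot; it is a hypothetical illustration in which each layout maps a contig name to a (scaffold, rank, strand) tuple, and contigs falling outside a longest increasing subsequence of ranks are flagged as reordered. All names and the toy data are assumptions.

```python
from bisect import bisect_left


def lis_indices(values):
    """Return the indices of one longest strictly increasing subsequence."""
    tail_vals, tail_idx = [], []      # smallest tail value (and its index) per run length
    prev = [None] * len(values)       # back-pointers used to rebuild the subsequence
    for i, v in enumerate(values):
        k = bisect_left(tail_vals, v)
        if k == len(tail_vals):
            tail_vals.append(v)
            tail_idx.append(i)
        else:
            tail_vals[k] = v
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else None
    out = []
    i = tail_idx[-1] if tail_idx else None
    while i is not None:
        out.append(i)
        i = prev[i]
    return out[::-1]


def compare_layouts(hic, homology):
    """Classify contigs shared by two layouts of contig -> (scaffold, rank, strand)."""
    shared = hic.keys() & homology.keys()
    # Contigs assigned to different scaffolds are grouped differently.
    report = {c: "regrouped" for c in shared if hic[c][0] != homology[c][0]}
    for scaffold in {hic[c][0] for c in shared}:
        members = sorted(
            (c for c in shared if c not in report and hic[c][0] == scaffold),
            key=lambda c: hic[c][1],
        )
        in_order = set(lis_indices([homology[c][1] for c in members]))
        for i, contig in enumerate(members):
            if i not in in_order:
                report[contig] = "reordered"
            elif hic[contig][2] != homology[contig][2]:
                report[contig] = "reoriented"
            else:
                report[contig] = "identical"
    return report


# Toy layouts (made up): c2 is flipped, and c3 and c4 swap places.
hic = {"c1": ("chr1", 0, "+"), "c2": ("chr1", 1, "+"),
       "c3": ("chr1", 2, "+"), "c4": ("chr1", 3, "+")}
homology = {"c1": ("chr1", 0, "+"), "c2": ("chr1", 1, "-"),
            "c3": ("chr1", 3, "+"), "c4": ("chr1", 2, "+")}
report = compare_layouts(hic, homology)
print(report)
```

The sketch assumes both layouts number each scaffold in the same direction; a scaffold that is reverse-complemented as a whole would need to be flipped before comparison.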
To examine the potential impact of the incorrectly orientated and ordered contigs on our results, we repeated our comparative phylogenetic analysis. All other Eucalyptus genomes were aligned to Hi-C-scaffolded E. melliodora, and both genomes in each pair were annotated for synteny, rearrangements, and unaligned regions, giving 64 annotated genomes, as per our main methods. The proportions of syntenic, unaligned, inverted, translocated, and duplicated sequence were plotted against the genomes' phylogenetic distance and regressions were calculated. These results were compared to those obtained with homology-scaffolded E. melliodora (Figure 3 and Table 1). Comparing the syntenic trendlines, homology scaffolding was found to inflate the proportion of syntenic content; however, the rate of synteny loss as phylogenetic distance increases was nearly identical. The unaligned genome proportion and its rate of accumulation were inflated by homology scaffolding; however, the unaligned proportion still rose significantly as phylogenetic distance increased. Hi-C left more contigs unplaced than did homology scaffolding, reducing the unaligned proportion within the Hi-C genome. Inversions were more frequent within the Hi-C genome; however, they still occupied a minority of the genome and neither increased nor decreased as phylogenetic distance increased. Translocations occupied a similar proportion of genomes; however, the rate of loss was not found to be significant when the genome was Hi-C scaffolded. Finally, duplications gave a very similar result and were significantly lost under both scaffolding methods.
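The regressions described above relate a genome proportion to phylogenetic distance and ask whether the slope differs from zero. The following is a minimal sketch of that calculation using ordinary least squares and a t statistic for the slope; the function name and the data points are made up for illustration and are not values from this study. Converting the t statistic to a two-sided P-value requires a Student's t distribution with n - 2 degrees of freedom, which is left to a statistics library.

```python
import math


def slope_test(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, intercept, t), where t is the slope divided by its
    standard error, to be compared against a t distribution with n - 2
    degrees of freedom.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    slope_se = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, intercept, slope / slope_se


# Hypothetical example: syntenic proportion falling as distance grows.
distance = [0.0, 0.1, 0.2, 0.3, 0.4]
syntenic = [0.60, 0.52, 0.47, 0.38, 0.33]
slope, intercept, t_stat = slope_test(distance, syntenic)
print(slope, intercept, t_stat)
```

Fitting the same model separately to the Hi-C and homology-scaffolded results allows the intercepts (overall proportion) and slopes (rate of change) to be compared, which is the comparison drawn in the text.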
In summary, homology scaffolding against E. grandis retained the majority of genome architecture. The syntenic, rearranged, and unaligned regressions were shifted up or down, but only translocations lost significance. Translocations may therefore not contribute to genome divergence as significantly as homology scaffolding predicted. However, the phylogenetic distance between E. grandis and E. melliodora is among the largest in our dataset. Performing this analysis using E. melliodora is therefore close to our worst-case scenario: 24 of our genomes are less diverged from E. grandis, and only 8 are more diverged.
The proportion of both Eucalyptus genomes within an alignment pair that was identified as syntenic, unaligned, inverted, translocated, or duplicated, plotted against the phylogenetic distance of the two genomes. The unaligned proportion is the species-specific fraction of the genome between genome pairs, resulting from either an insertion, deletion, differential inheritance, or sequence divergence. When combined, the proportions of sequence that are syntenic, unaligned, and rearranged equal 100% for each genome within an alignment pair. The rearranged fraction is further broken down into inverted, translocated, and duplicated regions. Phylogenetic distance was calculated as the sum of branch lengths between each genome pair within the phylogeny. The P-value tests whether the slope of the regression line is nonzero.
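Phylogenetic distance as the sum of branch lengths between two genomes can be sketched as a walk from each tip up to their most recent common ancestor. The example assumes a rooted tree stored as a hypothetical mapping of node to (parent, branch length); the node names and branch lengths are invented for illustration and do not come from the study's phylogeny.

```python
def root_path(tree, leaf):
    """Map each ancestor of leaf (and leaf itself) to its distance from leaf."""
    distances = {leaf: 0.0}
    node, total = leaf, 0.0
    while node in tree:               # the root has no entry in the mapping
        parent, length = tree[node]
        total += length
        distances[parent] = total
        node = parent
    return distances


def phylogenetic_distance(tree, a, b):
    """Sum the branch lengths on the path joining tips a and b."""
    up_from_a = root_path(tree, a)
    node, total = b, 0.0
    # Climb from b until reaching a node that is also an ancestor of a.
    while node not in up_from_a:
        parent, length = tree[node]
        total += length
        node = parent
    return total + up_from_a[node]


# Toy rooted tree: node -> (parent, branch length). All values are made up.
toy_tree = {
    "speciesA": ("node1", 0.2),
    "speciesB": ("node1", 0.3),
    "node1": ("root", 0.1),
    "speciesC": ("root", 0.5),
}
print(phylogenetic_distance(toy_tree, "speciesA", "speciesB"))
print(phylogenetic_distance(toy_tree, "speciesA", "speciesC"))
```

Sister tips are separated only by their two terminal branches, whereas tips joined at the root accumulate every branch on both paths, so the distance grows with the depth of the shared ancestor.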

Figure 2. Dot plot showing sequence homology between E. melliodora scaffolded using Hi-C and scaffolded using homology to E. grandis. The majority of contigs are identically grouped, oriented, and ordered within scaffolds. However, several inversions can be observed along with a smaller number of translocations.

Figure 3. Pairwise genome conservation and loss as phylogenetic distance increases. The proportion of both Eucalyptus genomes within an alignment pair that was identified as syntenic, unaligned, inverted, translocated, or duplicated, plotted against the phylogenetic distance of the two genomes. The unaligned proportion is the species-specific fraction of the genome between genome pairs, resulting from either an insertion, deletion,